High Performance LDA through Collective Model Communication Optimization
Abstract
Latent Dirichlet Allocation (LDA) is a widely used machine learning technique for big data analysis. The application includes an inference algorithm that iteratively updates a model until it converges. A major challenge is scaling in parallelization, because the model is huge and parallel workers need to communicate it continually. We identify three important features of the model in parallel LDA computation: (1) the volume of model parameters required for local computation is high; (2) the time complexity of local computation is proportional to the required model size; and (3) the model size shrinks as it converges. By investigating collective and asynchronous methods for model communication in different tools, we discover that optimized collective communication can improve the model update speed, thus allowing the model to converge faster. The performance improvement derives not only from accelerated communication but also from reduced per-iteration computation time as the model size shrinks during convergence. To foster faster model convergence, we design new collective communication abstractions and implement two Harp-LDA applications, “lgs” and “rtt”. We compare our new approach with Yahoo! LDA and Petuum LDA, two leading implementations that favor asynchronous communication methods, on a 100-node, 4000-thread Intel Haswell cluster. The experiments show that “lgs” can reach higher model likelihood with shorter or similar execution time compared with Yahoo! LDA, while “rtt” can run up to 3.9 times faster than Petuum LDA when achieving similar model likelihood.
Similar resources
HarpLDA+: Optimizing latent dirichlet allocation for parallel efficiency
Latent Dirichlet Allocation (LDA) is a widely used machine learning technique in topic modeling and data analysis. Training large LDA models on big datasets involves dynamic and irregular computation patterns and is a major challenge to both algorithm optimization and system design. In this paper, we present a comprehensive benchmarking of our novel synchronized LDA training system HarpLDA+ bas...
A Survey of Methods for Collective Communication Optimization and Tuning
New developments in HPC technology in terms of increasing computing power on multi/many core processors, high bandwidth memory/IO subsystems and communication interconnects, pose a direct impact on software and runtime system development. These advancements have become useful in producing high-performance collective communication interfaces that integrate efficiently on a wide variety of platfo...
Optimization of Collective Communications in HeteroMPI
HeteroMPI is an extension of MPI designed for high performance computing on heterogeneous networks of computers. The recent new feature of HeteroMPI is the optimized version of collective communications. The optimization is based on a novel performance communication model of switch-based computational clusters. In particular, the model reflects significant non-deterministic and non-linear escal...
Performance Characterisation of Intra-Cluster Collective Communications
Although recent works try to improve collective communication in grid systems by separating intra- and inter-cluster communication, the optimisation of communications focuses only on inter-cluster communications. We believe, instead, that the overall performance of the application may be improved if intra-cluster collective communications performance is known in advance. Hence, it is important to ha...
Optimization-Based Monitoring-Supported Calibration of a Thermal Performance Simulation Model
Building performance simulation is being increasingly deployed beyond the building design phase to support efficient building operation. Specifically, the predictive feature of the simulation-assisted building systems control strategy provides distinct advantages in view of building systems with high latency and inertia. Such advantages can be exploited only if model predictions can be relied u...